This document presents the results obtained for Colorado.


Subsetting by Affected Counties and Flood Phase

A total of 3858 tweets sent during the disaster were filtered based on the 9 counties reported as affected: Arapahoe, Boulder, Denver, El Paso, Jefferson, Larimer, Logan, Morgan and Weld.

Only the 2576 tweets sent from these counties during the “Flood” stage were selected for content analysis.
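The subsetting step can be sketched as follows. This is a minimal illustration in Python, not the code used to produce this report; the record fields `county` and `stage` and the toy tweets are assumptions made for the example.

```python
# Counties listed in the text as affected by the disaster.
AFFECTED_COUNTIES = {
    "Arapahoe", "Boulder", "Denver", "El Paso", "Jefferson",
    "Larimer", "Logan", "Morgan", "Weld",
}


def subset_tweets(tweets, counties=AFFECTED_COUNTIES, stage=None):
    """Keep tweets from the given counties, optionally restricted to one stage."""
    return [
        t for t in tweets
        if t["county"] in counties and (stage is None or t["stage"] == stage)
    ]


# Toy records, invented for illustration only (hypothetical field names).
tweets = [
    {"text": "Sirens going off near the creek", "county": "Boulder", "stage": "flood"},
    {"text": "Relief drive this weekend", "county": "Weld", "stage": "immediate_aftermath"},
    {"text": "Sunny here today", "county": "Mesa", "stage": "flood"},
]

in_counties = subset_tweets(tweets)                # county filter only
flood_only = subset_tweets(tweets, stage="flood")  # county and stage filter
print(len(in_counties), len(flood_only))
```

Applying the county filter first and the stage filter second mirrors the order described above, so the intermediate count can be checked before the stage restriction is applied.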

Most Common Words (Within Affected Counties)

A quick view of the most common words in the whole dataset:

## # A tibble: 5,601 x 2
##    word             n
##    <chr>        <int>
##  1 boulder       1811
##  2 boulderflood   428
##  3 coflood        321
##  4 colorado       274
##  5 cowx           270
##  6 flood          155
##  7 rain           140
##  8 creek          121
##  9 amp            118
## 10 denver         107
## # … with 5,591 more rows

Again, since “boulder” is by far the most common word and would dominate the topic modeling, it was removed from the dataset. The next four terms (“boulderflood”, “coflood”, “colorado”, “cowx”) were also excluded because they were very common and used neutrally in all four stages. After excluding these terms, the new list of common words looks as follows:

## # A tibble: 5,592 x 2
##    word         n
##    <chr>    <int>
##  1 flood      155
##  2 rain       140
##  3 creek      121
##  4 denver     107
##  5 flooding   106
##  6 day         87
##  7 water       71
##  8 people      70
##  9 time        68
## 10 park        67
## # … with 5,582 more rows
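The counting and exclusion steps can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pipeline behind the tables above: the tokenizer is a simple regular expression, no stop-word list is applied, the toy texts are invented, and the `excluded` set contains only some of the terms removed in the report.

```python
import re
from collections import Counter


def count_words(texts, excluded=frozenset()):
    """Count lowercase word tokens across texts, skipping excluded terms."""
    counts = Counter()
    for text in texts:
        # Simplified tokenizer (an assumption): runs of letters, digits, apostrophes.
        for token in re.findall(r"[a-z0-9']+", text.lower()):
            if token not in excluded:
                counts[token] += 1
    return counts


# Toy texts, invented for illustration only.
texts = [
    "Boulder creek rising fast #boulderflood",
    "Rain all day in Boulder #cowx",
    "Creek water over the path, rain continues",
]

# Some of the very common, stage-neutral terms removed in the report.
excluded = {"boulder", "boulderflood", "colorado", "cowx"}

before = count_words(texts)
after = count_words(texts, excluded)
print(before.most_common(3))
print(after.most_common(3))
```

Comparing `before` and `after` shows the intended effect: the dominant place name and hashtags disappear, and content words such as “creek” and “rain” move to the top of the list.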


The tf-idf statistic was computed to identify which words are most characteristic of each flood stage. It measures how important a word is to a document within a collection (or corpus) of documents: a word scores highly when it is frequent in one document but appears in few of the others. Here, each document is the set of tweets belonging to one stage, and the corpus is the set of stages. In the table below, each entry in the word column is the term followed by its stage (for example, “sirens___flood” is the word “sirens” in the flood stage).


## # A tibble: 566 x 6
##    stage              word                               n      tf   idf  tf_idf
##    <fct>              <fct>                          <int>   <dbl> <dbl>   <dbl>
##  1 flood              sirens___flood                    20 0.00426 1.39  0.00591
##  2 postflood          fitsocial___postflood             11 0.00421 1.39  0.00584
##  3 flood              flood___flood                     95 0.0202  0.288 0.00582
##  4 flood              stapleton___flood                 15 0.00320 1.39  0.00443
##  5 immediate_afterma… cofloodrelief___immediate_aft…    12 0.00299 1.39  0.00415
##  6 flood              flash___flood                     28 0.00597 0.693 0.00414
##  7 immediate_afterma… weld___immediate_aftermath        11 0.00274 1.39  0.00380
##  8 immediate_afterma… relief___immediate_aftermath      21 0.00524 0.693 0.00363
##  9 flood              flashflood___flood                12 0.00256 1.39  0.00355
## 10 immediate_afterma… flood___immediate_aftermath       48 0.0120  0.288 0.00344
## # … with 556 more rows
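The idf values in the table above (1.39, 0.693 and 0.288) are consistent with the convention idf = ln(N / number of documents containing the word) with N = 4 stages: ln(4/1) ≈ 1.39, ln(4/2) ≈ 0.693 and ln(4/3) ≈ 0.288. The sketch below reproduces that calculation in Python. It is an illustration only, not the code used for this report; the word counts are invented, and the stage name `preflood` is a placeholder, since the fourth stage is not named in this section.

```python
import math
from collections import Counter


def tf_idf(counts_by_doc):
    """Return {(doc, word): (n, tf, idf, tf_idf)} from per-document word counts."""
    n_docs = len(counts_by_doc)
    # Number of documents in which each word occurs at least once.
    doc_freq = Counter()
    for counts in counts_by_doc.values():
        doc_freq.update(counts.keys())
    scores = {}
    for doc, counts in counts_by_doc.items():
        total = sum(counts.values())
        for word, n in counts.items():
            tf = n / total                             # share of the document's words
            idf = math.log(n_docs / doc_freq[word])    # natural log, as in the table
            scores[(doc, word)] = (n, tf, idf, tf * idf)
    return scores


# Toy counts, invented for illustration; one "document" per stage.
counts_by_stage = {
    "preflood": Counter({"rain": 3, "forecast": 2}),  # placeholder stage name
    "flood": Counter({"flood": 5, "sirens": 2, "rain": 3}),
    "immediate_aftermath": Counter({"flood": 3, "relief": 2}),
    "postflood": Counter({"flood": 1, "relief": 1, "recovery": 2}),
}

scores = tf_idf(counts_by_stage)
for (stage, word), (n, tf, idf, score) in sorted(
    scores.items(), key=lambda item: item[1][3], reverse=True
)[:5]:
    print(f"{stage:<20} {word:<10} n={n} tf={tf:.3f} idf={idf:.3f} tf_idf={score:.4f}")
```

A word that occurs in only one stage gets the largest idf, ln(4), while a word present in three of the four stages is heavily down-weighted. This is why “flood” needs a much higher count than “sirens” to reach a similar tf-idf score in the table.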


Topic Modeling